Abstract. Living systems are often described utilizing informational analogies. An
important open question is whether information is merely a useful conceptual metaphor, 
or intrinsic to the operation of biological systems. To address this question, we provide 
a rigorous case study of the informational architecture of two representative biological 
networks: the Boolean network model for the cell-cycle regulatory network of the fission 
yeast S. pombe [1] and that of the budding yeast S. cerevisiae [2]. We compare our 
results for these biological networks to the same analysis performed on ensembles of two 
different types of random networks. We show that both biological networks share features 
in common that are not shared by either ensemble. In particular, the biological networks 
in our study, on average, process more information than the random networks. They 
also exhibit a scaling relation in information transferred between nodes that distinguishes 
them from either ensemble, even when compared to the ensemble of random networks
that shares important topological properties, such as a scale-free structure. We show 
that the most biologically distinct regime of this scaling relation is associated with the 
dynamics and function of the biological networks. Information processing in biological 
networks is therefore interpreted as an emergent property of topology (causal structure)
and dynamics (function). These results demonstrate quantitatively how the informational
architecture of biologically evolved networks can distinguish them from other classes of 
network architecture that do not share the same informational properties. 


1. Introduction 

Living systems are often described in terms of logic modules, information flows and
computation [3]. Such informational language is utilized in fields as diverse as evolutionary
biology [4], neuroscience [5], pattern formation [6], colonial decision making in eusocial
insects [7], and protein-protein interaction networks [8], to name just a few. Although
informational analogies are widely applied, an important open question is whether
information is intrinsic to the operation of biological systems or merely a useful conceptual
metaphor. The debate over the ontological status of information in biology has
far-reaching consequences, including implications for our understanding of whether
biological organization is fully reducible to the known laws of physics and chemistry, or
whether new “informational laws” that give information a foundational status in physical
theory are necessary to account for living matter. The resolution to this debate
should inform both our understanding of life’s emergence and what we should be looking
for in the search for signatures of life beyond Earth.

A weaker, and perhaps more widely accepted, perspective holds that while information is
certainly useful in describing biological systems, ultimately all of biological complexity is,
at least in principle, fully reducible to known physics. Under this view, although biological
systems may appear complex, no new physical principles are needed to explain the
phenomenon of life. By contrast, the stronger viewpoint takes the perspective that information
is not merely descriptive, but instead is intrinsic to the operation of living systems [15]. If
the strong viewpoint holds, life would necessarily be classified as distinct from other kinds
of physical systems, as we know of no other class of physical system where information
is necessary to specify its state. A convincing case for the strong viewpoint must
quantitatively satisfy two conditions:

(1) biological systems must be demonstrated to somehow be unique in their informational 
architecture, as compared to other classes of physical systems; and (2) information must 
be shown necessary to the execution of biological function - that is to say, information 
must be shown to matter to matter. 

The necessity of the first condition is perhaps obvious: if information is fundamental to
biological organization, a strong signal should appear when contrasting information in
living systems with the same concept of information as applied to non-living systems. The
challenge is to define the appropriate concept of “information” in this context. The second
criterion is less immediately obvious, but is clarified by a simple example. An informational
pattern, such as the sequence of bases necessary to specify the GNRA tetraloop “GAGA”,
is readily copyable to other states of matter, such as this page of text. This kind
of pattern therefore cannot be unique to life in the sense of the strong view, as other
states of matter can share the same informational pattern. This may be one reason why
attempts to quantify biological complexity in terms of Shannon entropy have been
relatively unsuccessful (for example, it is well known that genome size, which can be correlated
with Shannon information content, does not readily map to organismal complexity).
Whatever, if anything, characterizes life as a unique state of matter quantified by “information”
must therefore necessarily tie information to doing something, e.g. to the causal structure
underlying function. It is for this reason that we utilize the terminology
“informational architecture” rather than “information” or “informational pattern” herein,
as architecture implies physical structure whereas mere patterns are not necessarily tied
to their physical instantiation. Thus far, attempts to address the strong view have been
primarily qualitative.

^With the exception of the trivial sense in which “information” is used to describe every physical system.

In what follows, we utilize simple, well-studied Boolean network models for real biological
systems and demonstrate quantitatively that the biological networks studied have distinct
informational architecture as compared to their random network counterparts. We do so
by utilizing information theoretic analyses, detailing how information is processed in the
execution of biological function. Our approach aims in part to address a research program
laid out in a seminal paper by Nurse, wherein he calls for a more focused understanding
of flows of information within biological networks [3]. We therefore focus on the concept
of “information flows”, interpreted as information processing and measured by Schreiber’s
transfer entropy [19], as a concept of information that allows us to address conditions 1
and 2 above. We conjecture that if the strong viewpoint holds, it is how living systems
implement informational correlations (process information) through space and time in the
execution of function that sets them apart from other classes of physical systems. We
note that while there has been a great deal of emphasis placed on understanding the
“logic of life”, quantitative results have thus far been primarily topological or dynamical.
Examples include analysis of the topology of network motifs (or logic circuits) necessary
to biological function [20, 21], or dynamical network features such as the robustness of
the global attractor landscape [22]. The approach we present herein is distinct from these
previous efforts in that we explicitly address information processed, which, as we will show,
should be viewed as an emergent property of networks that arises from the integration of
topology and dynamics.


Herein, we focus on two model systems developed previously: the Boolean network model
for the cell-cycle regulatory network of the fission yeast (S. pombe) [1] and that of the
budding yeast (S. cerevisiae) [2]. We calculate the information transferred between pairs
of nodes within each network in the execution of function and contrast the results with the
same analysis performed on random networks of two different classes: Erdős-Rényi (ER)
networks and Scale-Free (SF) networks. The latter share a scale-free structure with the
cell-cycle networks, which is often cited as a distinguishing feature of many real-world networks
including biological networks [23, 24, 25, 26]. We show that both cell-cycle networks share
commonalities in their informational architecture that set them far apart from their random
network counterparts. One of the most striking features uncovered is a scaling relation for
the distribution of information transfer between nodes, which for the cell-cycle networks is
statistically different from that observed for either ensemble of random networks, despite
the topological resemblance between the cell-cycle networks and SF networks in our study.
We identified the regime where the cell-cycle networks differ most significantly from the
random networks in their scaling relation, and characterized the patterns in information
transfer relative to the function (dynamics) of each network. The results show that control
kernel nodes, which have previously been connected with regulation of function, play an
important role in information transfer within the cell-cycle networks. We also investigated
how the causal structure (topology) of the cell-cycle networks affects information transfer
as compared to the random networks and found that the scale-free structure, shared by the
cell-cycle and SF networks, utilizes long-range correlations more than direct causal
interactions between nodes for information processing, unlike what is observed for ER networks.
These results are suggestive of previously unidentified information-based organizational
principles that go beyond topological considerations, which may be critical to biological
function. Finally, we found that both cell-cycle networks process more information than the
majority of random networks in either ensemble, suggestive of evolutionary optimization
for information processing.

We note that our analysis, being based on standard information measures, is not level
specific, and thus the approach is expected to generalize to most, if not all, biological
networks. Therefore, we potentially offer an operational means to quantify life or to detect
generic signatures of life, in terms of its informational properties. The results presented in
the paper thus open a new framework for addressing the debate over the status of
information in biology, by demonstrating quantitatively how the informational architecture of
biologically evolved networks distinguishes them from other classes of network architecture
that do not share the same informational properties.


2. Model and Methods 

The study presented herein is a first attempt to identify patterns in information processing 
that might be distinctive of biological organization, as compared to other classes of physical 
systems, and in turn to connect these patterns to causal structure (topology) and function 
(dynamics). Our analysis requires an integrative synthesis of a number of distinct areas, 
including: Boolean network models of biological function, information theoretic analysis 
for distributed computation, sampling networks with topological constraints, and control 
of cellular behavior. We briefly describe each herein. 

2.1. Boolean Network Models for the Cell-Cycle Regulatory Process. Boolean
network models have proven in many cases to provide accurate models for biological
function. They are also the most readily tractable network models for information-theoretic
analysis, since each node may take on only one of two discrete states [30, 31].
They are thus ideal for our case study. In this study we focus on the cell-cycle regulatory
networks of the fission yeast S. pombe [1] and of the budding yeast S. cerevisiae [2]. The
Boolean network models for both systems are shown in Fig. Each node corresponds to
one protein among the small subset of key regulatory proteins involved in each respective
cell cycle. The state of node i, $S_i(t) \in \{0, 1\}$, indicates whether the given protein is present
(1) or not (0). Biochemical causal interactions between proteins are denoted by edges in
the network. Time advances through the cycle and the states of nodes are updated (in
parallel) in discrete time steps according to the following rule:


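(1)
\[
S_i(t+1) =
\begin{cases}
1, & \sum_j a_{ij} S_j(t) > \theta_i, \\
0, & \sum_j a_{ij} S_j(t) < \theta_i, \\
S_i(t), & \sum_j a_{ij} S_j(t) = \theta_i,
\end{cases}
\]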


where $a_{ij}$ is the edge weight between node i and j ($a_{ij} = -1$ for inhibition links and $a_{ij} = 1$
for activation links), and $\theta_i$ is the threshold for node i. The threshold for all nodes in both
networks is 0, with the exception of Cdc2/13 and Cdc2/13*, which have thresholds of -0.5
and 0.5, respectively.
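As an illustration of the threshold update rule, the following minimal Python sketch applies synchronous updates to a small network. The 3-node network is a hypothetical toy and is not one of the yeast models. The function name `step`, the convention that `a[i, j]` is the weight of the edge from node j into node i, and the choice that a node keeps its current state when the weighted input equals its threshold are assumptions made for this sketch.

```python
import numpy as np

def step(state, a, theta):
    """One synchronous threshold update of all nodes.

    state : 0/1 vector of node states at time t
    a     : a[i, j] is the weight of the edge from node j into node i (assumed convention)
    theta : vector of node thresholds
    """
    total = a @ state             # weighted input to each node
    nxt = state.copy()            # input equal to threshold: node keeps its state
    nxt[total > theta] = 1
    nxt[total < theta] = 0
    return nxt

# Hypothetical toy network: 0 activates 1, 1 activates 2, 2 inhibits 0.
a = np.array([[0, 0, -1],
              [1, 0,  0],
              [0, 1,  0]])
theta = np.zeros(3)

state = np.array([1, 0, 0])
trajectory = [state]
for _ in range(4):
    state = step(state, a, theta)
    trajectory.append(state)
```

In this toy example the trajectory settles on a fixed point after three updates, which is the kind of attractor referred to throughout this section.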

Although numerous Boolean network models for biological systems exist, we specifically
focus on these two networks because they are small and accurately model biological function.
The small network size, with ~10 nodes each, permits statistically reliable comparison of
results for the biological networks to the average properties of randomized networks of the
same size (number of nodes and edges), due to the relatively small ensemble size of
comparable random networks. Thus, for these two networks, we can readily address condition (1)
as posed above, by making a meaningful comparison between each biological network and
an ensemble of random networks with similar size and topological features. Connecting
information processing and function, as laid out by condition (2), is also tenable for both
models. For both cell-cycle networks, there is a direct connection between the dynamics
on these networks and the corresponding biological function: both networks correctly
reproduce the sequences of protein states corresponding to the phases of the respective
cell cycle (see [1, 2] for details). Therefore, any distinctive patterns uncovered in our analysis
of informational architecture can be related to dynamics, and consequently, the biological
function of each network.

We note in particular that the task of addressing condition (2) is in general not a trivial
one. Our approach is to connect information processing to topology and dynamics, that is,
to the causal mechanisms of each biological network that define its function [32] (as we will
show in addressing condition (1), topology alone is not sufficient to quantify information
processing in the biological networks). Stated differently, condition (2) requires identifying
how information transfer might depend on causal structure in non-trivial ways such that
information processing is intrinsic to function (and therefore matters to matter). We
note that the edges connecting nodes for both cell-cycle networks are modeled based on
experimental data detecting a direct causal interaction between the proteins they represent
[1, 2]. Comparing information transfer between nodes connected by an edge to that between
nodes that are not directly interacting therefore provides insights into how correlations are
distributed amongst nodes in the execution of function.

Our analysis focuses on a few key nodes of each cell-cycle network, called the control kernel,
that regulate the dynamics of the entire network. These are highlighted in red in Fig.
Control kernels were recently discovered by Kim et al. in a number of biological networks,
and seem to be a generic feature of Boolean models for regulatory networks [33]. A control
kernel is defined as the minimal set of nodes such that pinning their values to those of the
primary attractor associated with biological function guarantees the convergence of the
network state to that attractor. This property led Kim et al. to conclude that the control
kernel acts as a local mechanism for regulating the global behavior of the network. To address
condition (2) we study how information transfer is related to the causal mechanisms of both
cell cycles by determining the distribution of information transfer among pairs of nodes
with and without a causal connection, and more specifically we analyze information transfer
through control kernel nodes that regulate function.
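The definition of a control kernel can be made concrete with a brute-force sketch. This is not the search procedure of Kim et al. [33]; it is an exhaustive illustration that is only feasible for very small networks. The function names, the `a[i, j]` convention (edge from node j into node i), the tie rule in `step`, the assumption that the target attractor is a fixed point, and the 2-node example network are all assumptions made for this sketch.

```python
from itertools import combinations, product
import numpy as np

def step(state, a, theta):
    """Synchronous threshold update; a node keeps its state when input equals threshold."""
    total = a @ state
    nxt = state.copy()
    nxt[total > theta] = 1
    nxt[total < theta] = 0
    return nxt

def pins_to_attractor(a, theta, attractor, pinned, max_steps=50):
    """True if clamping `pinned` nodes to their attractor values drives every
    initial state to `attractor` (assumed to be a fixed point)."""
    attractor = np.asarray(attractor)
    pinned = np.asarray(pinned, dtype=int)
    for bits in product([0, 1], repeat=len(attractor)):
        s = np.array(bits)
        for _ in range(max_steps):
            s[pinned] = attractor[pinned]   # clamp before every update
            s = step(s, a, theta)
        s[pinned] = attractor[pinned]
        if not np.array_equal(s, attractor):
            return False
    return True

def control_kernel(a, theta, attractor):
    """Smallest node set whose pinning guarantees convergence (exhaustive search)."""
    n = len(attractor)
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            if pins_to_attractor(a, theta, attractor, subset):
                return list(subset)

# Hypothetical 2-node example: node 0 activates node 1.
a = np.array([[0, 0],
              [1, 0]])
theta = np.zeros(2)
kernel = control_kernel(a, theta, attractor=[1, 1])
```

In the toy example, pinning node 0 alone is enough to drive node 1 to its attractor value, whereas pinning node 1 leaves node 0 free, so the search returns the single upstream node.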

We note that although budding and fission yeast are closely related genetically
[34], closely related genes in the two organisms can play vastly different functional
roles [35]. Also, while the two networks share similar overall dynamics in terms of the
dominating largest attractor [1, 2], they show significant differences in their underlying
biochemical machinery [36]. Thus, for the purposes of our study, we view them as two
independent examples of biological networks with related function. Studying both
networks in parallel provides a comparative analysis to look for features common to biological
networks that are not shared by their random counterparts.

2.2. Construction of Random Boolean Networks with Biological Constraints. To
identify any features distinctive to biologically evolved networks that might satisfy conditions
(1) or (2), we compare the results of our information-theoretic analysis on both cell-cycle
networks to the same analysis performed on ensembles of random networks. Scale-free
structure is ubiquitous in real-world networks, from the Internet to social and biochemical
networks. Scale-free structure occurs in cases where there exist hub nodes with
significantly more connections than others, and where the distribution of the number of edges
for each node follows a power-law function over the whole network. Many features of
biological networks (e.g. robustness to random failure) [23], as well as others classified as
scale-free networks, have been explained as arising from the scale-free structure. Here, we
are in particular interested in going beyond a scale-free structure and addressing whether
information processing (which we attribute to topology and dynamics) is a more
distinctive feature of biological networks than global topology (scale-free structure) alone. We
therefore compare our results for the biologically functional cell-cycle networks to two
different classes of random networks that act as controls for analyzing properties distinctive
of biological networks: Erdős-Rényi (ER) networks and Scale-Free (SF) networks.

For meaningful comparison, both classes of random networks were constructed under
constraints with reference to each cell-cycle network (see Table). “ER network” here indicates
a network sampled using an Erdős-Rényi random graph model, where every pair of
nodes is connected by an edge with a fixed probability [37, 38]. We use the term
“ER networks” to distinguish them from the other class of randomly sampled networks (SF)
utilized in this study. We note that our ER networks are equivalent to the class of networks
commonly referred to as “random networks” in other literature. In our study, the term
Scale-Free network, unlike its common definition, does not mean that sample networks in
the ensemble exhibit power-law degree distributions. Due to their small size, even the
degree distributions of the biological cell-cycle networks do not follow a power law. Instead,
the term “scale-free” as used in this paper emphasizes that the sample networks have
exactly the same degree sequence (the number of edges of each node, listed over the whole
network) as each cell-cycle network (see Table), and hence the networks share the same
bias in terms of global structure as the cell-cycle networks. In this paper, the SF networks
are generated by edge-swapping from the reference cell-cycle network [39] (for a more
general method, see [40, 41, 42]). We note that since having the same degree sequence is a
sufficient condition for sharing a degree distribution, the analogous randomly generated
networks with reference to larger biological networks that are truly scale-free would also
be scale-free. Therefore, the comparison of the cell-cycle network to both classes of random
networks allows us to isolate the contribution attributable to global topological features
such as degree distribution, which the biological networks share with the SF networks but
not the ER networks in our sample, from any informational structure that may arise solely
due to network architecture which is peculiar to biological function.
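The two randomization schemes can be sketched as follows. This is a simplified illustration rather than the sampling procedure used in the study: it treats the network as a set of directed, unsigned edges without self-loops, and it fixes the number of ER edges instead of using a per-pair probability. The full set of constraints applied to the ensembles (for example on link signs or self-loops) is not reproduced here. The function names and the 4-node reference edge list are hypothetical.

```python
import random

def er_network(n, m, rng):
    """ER-style sample: m distinct directed edges chosen uniformly, no self-loops."""
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    return set(rng.sample(pairs, m))

def degree_preserving_swap(edges, n_swaps, rng):
    """Randomize by swapping the targets of two edges: (a,b),(c,d) -> (a,d),(c,b).
    Every node keeps its in-degree and its out-degree."""
    edges = set(edges)
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        (a, b), (c, d) = rng.sample(sorted(edges), 2)
        if a == d or c == b or (a, d) in edges or (c, b) in edges:
            continue                      # would create a self-loop or a duplicate edge
        edges -= {(a, b), (c, d)}
        edges |= {(a, d), (c, b)}
        done += 1
    return edges

def degrees(edges, n):
    """Return (out-degree list, in-degree list) for nodes 0..n-1."""
    out_deg, in_deg = [0] * n, [0] * n
    for s, t in edges:
        out_deg[s] += 1
        in_deg[t] += 1
    return out_deg, in_deg

rng = random.Random(0)
reference = {(0, 1), (1, 2), (2, 0), (0, 2), (3, 0), (1, 3)}   # hypothetical reference network
sf_sample = degree_preserving_swap(reference, n_swaps=20, rng=rng)
er_sample = er_network(n=4, m=len(reference), rng=rng)
```

The swap leaves every node's degree untouched, so the sample shares the degree sequence of the reference network, whereas the ER sample only shares its number of nodes and edges.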

2.3. Quantifying Information Processing. Our information-theoretic analysis focuses
on quantifying “information processing” in the biological and random networks in our
study. Our motivation is to capture Nurse’s notion of “information flows”^ as a concept of
information relevant to biological function [3]. We adopt the notion of information
processing as implemented in information dynamics, a formalism for quantifying the component
operations of computation in dynamical systems, utilizing the tools of information theory.
In information dynamics, information processing is quantified using Schreiber’s transfer
entropy [19, 30], a directional measure of information transfer. Transfer entropy
(TE) from a source node Y to a target node X is defined as the reduction in uncertainty
about the future state of X due to knowledge of the state of Y, above the reduction in
uncertainty provided by knowledge of the past states of X. The TE from Y to X can be written
as:


(2)
\[
T_{Y \to X}(k) = \sum p\left(x_{n+1}, x_n^{(k)}, y_n\right) \log_2 \frac{p\left(x_{n+1} \mid x_n^{(k)}, y_n\right)}{p\left(x_{n+1} \mid x_n^{(k)}\right)}
\]

where the sum runs over the set of all possible patterns of states $(x_n^{(k)}, x_{n+1}, y_n)$, and $x_n^{(k)}$
denotes $(x_n, \ldots, x_{n-k+1})$, the vector of the k previous states of the destination X at time step n+1.
Also, $y_n$ and $x_{n+1}$ represent the state of Y at the current time step and the state of X
at the next time step. The probability distributions in Eq. (2) are defined as the relative
frequency of each pattern of states over the time series of dynamical states of the network.
In our study, to obtain the probability distributions for each cell-cycle network and the
random network ensembles, we generated every possible trajectory for each network by
applying the updating rule in Eq. (1) up through 20 time steps for all the $2^n$ possible initial
network states (i.e. all possible combinations of binary states for nodes in the network),
where n is the total number of nodes in the particular network under study. We chose
20 time steps as the length of the trajectory generated from each initial state, since it is
sufficiently long to capture the transient dynamics before the network converges on a fixed
point (an attractor), both for the cell-cycle networks and for the vast majority of random networks.
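The procedure described above can be sketched in Python as follows: trajectories are generated from every initial state, and transfer entropy is then estimated from relative frequencies of state patterns. This is a minimal plug-in estimator written for illustration, not the authors' code. The function names, the `a[i, j]` convention (edge from node j into node i), the tie rule in `step`, and the choice to pool counts over all trajectories are assumptions made for this sketch.

```python
from collections import Counter
from itertools import product
from math import log2
import numpy as np

def step(state, a, theta):
    """Synchronous threshold update; a node keeps its state when input equals threshold."""
    total = a @ state
    nxt = state.copy()
    nxt[total > theta] = 1
    nxt[total < theta] = 0
    return nxt

def all_trajectories(a, theta, steps=20):
    """Trajectories from all 2**n initial states; shape (2**n, steps + 1, n)."""
    runs = []
    for bits in product([0, 1], repeat=a.shape[0]):
        s = np.array(bits)
        run = [s]
        for _ in range(steps):
            s = step(s, a, theta)
            run.append(s)
        runs.append(run)
    return np.array(runs)

def transfer_entropy(source, target, k):
    """Plug-in TE (in bits) from paired collections of 0/1 time series."""
    joint = Counter()                       # counts of (x_{n+1}, history of X, y_n)
    for y, x in zip(source, target):
        y, x = [int(v) for v in y], [int(v) for v in x]
        for n in range(k - 1, len(x) - 1):
            joint[(x[n + 1], tuple(x[n - k + 1:n + 1]), y[n])] += 1
    total = sum(joint.values())
    hist_y, next_hist, hist = Counter(), Counter(), Counter()
    for (x1, h, yn), c in joint.items():
        hist_y[(h, yn)] += c
        next_hist[(x1, h)] += c
        hist[h] += c
    te = 0.0
    for (x1, h, yn), c in joint.items():
        p_given_both = c / hist_y[(h, yn)]          # p(x_{n+1} | history, y_n)
        p_given_hist = next_hist[(x1, h)] / hist[h] # p(x_{n+1} | history)
        te += (c / total) * log2(p_given_both / p_given_hist)
    return te

# Synthetic check: X copies Y with a one-step delay, over all length-4 inputs.
ys = [list(bits) for bits in product([0, 1], repeat=4)]
xs = [[0] + y[:-1] for y in ys]
te_forward = transfer_entropy(ys, xs, k=1)   # Y -> X
te_backward = transfer_entropy(xs, ys, k=1)  # X -> Y
```

For a network with trajectories `runs = all_trajectories(a, theta)`, the TE from node j to node i would be `transfer_entropy(runs[:, :, j], runs[:, :, i], k)`. In the synthetic check, the delayed copy yields one full bit of transfer in the forward direction and none in the reverse direction, which illustrates the directionality of the measure.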

^Herein we use the terms “flow”, “transfer” and “processing” interchangeably in reference to information,
and quantify this concept of information using Schreiber’s transfer entropy [19]. This is more informal
language than the technical meaning of “information flow” as formulated by Ay and Polani, which is a
measure of causal flows [43]. Since we do not implement the Ay and Polani measure herein, we use “flow”
to directly connect our quantitative results with the notion of “information flow” as introduced by Nurse
(which does not necessarily imply direct causal interaction).

From Eq. (2) one can see that TE is the mutual information between $y_n$ and $x_{n+1}$
conditioned on the k previous states of X. Directionality arises due to the asymmetry between time
steps for the state of the source and the destination in Eq. (2). Due to this asymmetry, TE
can be utilized to measure “information flows”. It is therefore more appropriate for our
purposes of quantifying distinct features of information processing in biological networks
than related measures such as mutual information, which measure correlations only,
without reference to directionality. We note that TE is correlational and does not necessarily
reflect direct causal interactions between pairs of nodes, but may take on non-zero values
even for pairs of nodes that do not directly interact [44]. The causal structure for the
networks studied herein is fully described by their edges, as noted in Section 2.1.
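The reading of TE as a conditional mutual information can be written out explicitly. The following identity is standard information theory rather than a result of this study; H denotes Shannon entropy, I denotes mutual information, and $x_n^{(k)}$ is the k-step history of X.

```latex
\begin{align}
T_{Y \to X}(k) &= I\left(x_{n+1} ; y_n \mid x_n^{(k)}\right) \\
               &= H\left(x_{n+1} \mid x_n^{(k)}\right) - H\left(x_{n+1} \mid x_n^{(k)}, y_n\right) \;\ge\; 0 .
\end{align}
```

For binary nodes the first entropy is at most 1 bit, so the TE for any single ordered pair lies between 0 and 1 bit. It vanishes whenever the state of Y adds nothing to what the history of X already reveals about the next state of X.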


3. Results 


3.1. Scaling Relation for Information Transfer in Yeast Cell-Cycle Networks. 

To reiterate, we use transfer entropy (TE) as a candidate measure for identifying features
potentially distinctive to biological networks, focusing our analysis on the concept of
“information flows” that may be particular to biological function, as suggested by Nurse [3].
For every ordered pair of nodes (i, j) in both cell-cycle networks, we calculated the TE from
node i to node j, $T_{i \to j}$ (as described in Section 2.3), and ranked the pair according
to its measured value of TE. The same analysis was performed on each network in the
ensembles of ER and SF networks as a point of comparison, to identify any features
particular to the biological networks. The resulting scaling relations are shown in Fig. 2,
where biological networks are highlighted in red for fission yeast (top) and budding yeast
(bottom). The ensemble averages for the scaling relations of ER and SF random networks
generated with reference to each cell cycle are shown in green and blue, respectively, in
each panel. Error bars represent the standard deviation over the ensembles of
random networks (averages are over 1,000 random networks). For the results shown in
Fig. 2, the history lengths k = 2 and k = 5 were selected since these history lengths show
the most distinctive scaling distribution as compared to the two random network
ensembles for the fission and budding yeast networks, respectively (see Supplement for scaling
relations over history lengths k = 1, ..., 8). The scaling distributions reveal a non-linear
relationship between the information transferred between pairs of nodes (y-axis) and their
relative rank (x-axis), for each of the network classes studied: biological, SF and ER.
The scaling relation is most striking for the biological networks (red), which are significant
outliers with respect to both the ER and SF ensembles.


Fig. 2 shows that ER networks have much less information transfer than SF or biological
networks. This result suggests that scale-free structure, which is present in the SF and
biological networks but not in the ER networks, plays a significant role in increasing
information transfer within a network. The deviation between ER networks and SF networks
or the cell-cycle networks may be expected due to the differences in topological features,
on which the transfer entropy partially depends. However, surprisingly, scale-free
structure alone does not account for the high level of information transfer between nodes in
the biological networks. The biological networks differ from the SF networks despite their
common topological features. With the exception of the few highest ranked node pairs, the
biological networks exhibit information transfer that is several standard deviations larger
than the corresponding rank in the SF ensemble for the majority of ranks with TE > 0.
The excess TE observed in the biological networks in Fig. 2 deviates by between 1σ and 5σ
from that of the SF networks, with a trend of increasing divergence from the SF ensemble
for lower ranked node pairs that still exhibit correlations (e.g., where TE > 0). We define
$\chi$ as the set of node pairs whose TE deviates by more than 2σ from the SF networks, indexed by
rank. The ranks that deviate by more than 2σ are $\chi_f = \{9, 10, \ldots, 30\}$ for fission yeast
and $\chi_b = \{31, 32, \ldots, 68\}$ for budding yeast (highlighted between the dashed lines in each
panel in Fig. 2). Patterns in the biological distribution also exhibit plateaus, suggesting
that there exist subgroups wherein information flow is evenly distributed, as opposed to
a few dominating informational connections. We analyze this substructure in Section 3.4
in terms of the dynamics (function) of the cell-cycle networks.
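The rank-based comparison described above can be sketched as follows. The code assumes that a matrix `te[i, j]` holding the TE from node i to node j is already available for the biological network and for each network in a random ensemble. The function names, the exclusion of self-pairs (i, i), and the tiny example matrices are assumptions made for illustration; they are not taken from the study.

```python
import numpy as np

def ranked_te(te):
    """TE values of all ordered pairs (i, j), i != j, sorted from high to low."""
    te = np.asarray(te, dtype=float)
    off_diagonal = ~np.eye(te.shape[0], dtype=bool)
    return np.sort(te[off_diagonal])[::-1]

def deviating_ranks(bio_te, ensemble_te, n_sigma=2.0):
    """1-based ranks at which the biological TE exceeds the ensemble mean
    at the same rank by more than n_sigma ensemble standard deviations."""
    bio = ranked_te(bio_te)
    ens = np.array([ranked_te(m) for m in ensemble_te])
    mean, std = ens.mean(axis=0), ens.std(axis=0)
    mask = bio > mean + n_sigma * std
    return [int(r) + 1 for r in np.flatnonzero(mask)]

# Hypothetical 2-node example with an "ensemble" of two random networks.
bio_te = [[0.0, 1.0],
          [0.5, 0.0]]
ensemble_te = [[[0.0, 1.0], [0.1, 0.0]],
               [[0.0, 0.8], [0.1, 0.0]]]
chi = deviating_ranks(bio_te, ensemble_te)
```

In this toy case only the second-ranked pair stands out from the ensemble, so the returned set of ranks plays the role of $\chi$ in the text.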


3.2. Total Information Processed. We also calculated the total information processed
by the cell-cycle networks and compared it to that of individual instances of SF and ER
networks within the random ensembles. We define a quantity called total information
processed, denoted by $I_P$, which is the sum of the transfer entropy between all pairs of
nodes (i, j) for an individual network, $I_P = \sum_{(i,j)} T_{i \to j}(k)$, for a given history length k (k = 2
and k = 5 for the fission and the budding yeast, respectively). Accordingly, the total
information processed by the fission and budding yeast cell-cycle networks is $I_P$ = 8.09 and
$I_P$ = 3.51, respectively, shown as red lines in Fig. The frequency distributions of
networks associated with $I_P$ for the two sets of SF and ER networks are shown in Fig.
Comparing the two distributions, it is evident that, on average, SF networks process more
information than ER networks, with higher frequencies for networks with larger values of
$I_P$. The figure also shows that the biological networks process more information than most
random networks in the ensembles. More specifically, the fission yeast cell-cycle network
lies outside of 95% of the SF networks and 100% of the ER networks. Similarly, the
budding yeast cell-cycle network is in the 95.1th percentile of SF networks and the 99.5th
percentile of ER networks. This result indicates that the cell-cycle networks are highly
optimized, with only 1/2000 ER networks and 1/200 SF networks displaying comparable
total information processed in a random draw.
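As a small sketch of the two quantities used in this subsection, the total information processed is a sum over a TE matrix, and the position of a biological network within an ensemble is the fraction of ensemble networks with a smaller total. The function names and the example values below are hypothetical, and the sketch assumes the TE matrix holds the values for all ordered pairs to be included in the sum.

```python
import numpy as np

def total_information(te):
    """Total information processed: sum of TE over all ordered node pairs in `te`."""
    return float(np.sum(np.asarray(te, dtype=float)))

def fraction_below(bio_value, ensemble_values):
    """Fraction of ensemble networks whose total is strictly below the biological one."""
    return float(np.mean(np.asarray(ensemble_values, dtype=float) < bio_value))

# Hypothetical values, for illustration only.
te = [[0.0, 1.0],
      [2.0, 0.0]]
bio_total = total_information(te)
share = fraction_below(bio_total, [1.0, 2.0, 3.0, 4.0])
```

A value of `share` close to 1 corresponds to a biological network that processes more information than nearly all networks in the ensemble.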


3.3. Distribution of Information Processing over Causal Structure. The
correlations measured with TE can arise due to a direct causal effect (e.g. via an edge) or to statistical
correlations between two nodes that are not directly connected via a causal interaction (e.g.
via long-range correlations). We note that since we are using transfer entropy in our
analysis, the measured correlations are across space (between nodes) and time (between discrete
time steps). They therefore differ from correlations typically associated with critical
phenomena as applied to networks, which are usually strictly spatial (e.g. mutual
information).

To identify whether the distinctive features of the cell-cycle networks shown in Fig. 2
and Fig. 3 arise due to information transfer along edges or to longer-range correlations,
we classified each pair of nodes as either having a direct causal interaction (connected by
an edge) or not (no edge). We also classified them as being correlated (TE > 0) or not
(TE = 0). The results of this classification scheme are shown in Table 2 and are very
similar for both cell-cycle networks. In both cases, the largest share (> 40%) of node pairs
are correlated via information transfer even though they are not causally connected by an
edge. Roughly a quarter of node pairs have a causal interaction with information transfer,
and a similar fraction have no interaction and no information transfer. The remaining
minority of node pairs exhibit a causal interaction with no corresponding transfer of
information (causation without correlation).
The same analysis was applied to the SF and ER networks corresponding to each cell-cycle
network, with results shown in Fig. 4 (see Supplement for detailed data). Fig. 4 shows a
distinctive transition in the distribution of correlations moving from the ER to the SF
network ensembles. In the ER networks, the majority of node pairs that are not connected
by an edge are also not correlated. By contrast, the majority of node pairs in SF networks
are correlated even though they are not directly connected, and this pattern is even more
prominent for the biological cell-cycle networks. Biological networks therefore appear to be
highly optimized for correlations among nodes, even in cases where there is no direct causal
interaction.
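
The classification described above can be sketched as follows. The function name, the matrix conventions (a nonzero entry adjacency[i][j] marks an edge from node i to node j), the inclusion of self-pairs (i, i) among the ordered pairs, and the toy matrices are all assumptions made for illustration rather than the authors' code or data.

```python
def classify_pairs(adjacency, te, tol=0.0):
    """Percentage of ordered node pairs in each (edge, correlation) category.

    adjacency[i][j] != 0 is taken to mean a direct causal interaction
    from i to j; te[i][j] is the transfer entropy from i to j.
    All n * n ordered pairs are counted, including i == j (an assumption).
    """
    n = len(adjacency)
    counts = {
        ("edge", "TE>0"): 0,
        ("edge", "TE=0"): 0,
        ("no edge", "TE>0"): 0,
        ("no edge", "TE=0"): 0,
    }
    for i in range(n):
        for j in range(n):
            causal = "edge" if adjacency[i][j] != 0 else "no edge"
            correlated = "TE>0" if te[i][j] > tol else "TE=0"
            counts[(causal, correlated)] += 1
    total = n * n
    return {key: 100.0 * value / total for key, value in counts.items()}


# Hypothetical 3-node example (+1 activation, -1 inhibition, 0 no edge).
toy_adjacency = [[0, 1, 0],
                 [0, 0, -1],
                 [1, 0, 0]]
toy_te = [[0.0, 0.2, 0.1],
          [0.0, 0.0, 0.0],
          [0.3, 0.05, 0.0]]
for category, percent in classify_pairs(toy_adjacency, toy_te).items():
    print(category, round(percent, 2))
```

The "no edge, TE>0" category is the one that captures correlation without a direct causal interaction, and the "edge, TE=0" category captures causation without correlation.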

3.4. Information Transfer and Regulation of Biological Function. We analyzed
the scaling relations for the cell-cycle networks in Fig. 2, with particular focus on the
biologically distinct regime χ, in terms of the global causal structure of each biological
network. While the local causal structure of these networks is fully articulated by the
edges shown in Fig. 1, we view the global causal structure as embedded in the relationship
between control kernel nodes and the flow of dynamics to the associated attractors in
network state space, as discussed in Section 2.1 (see also the discussion in [46]).

Here we are interested in how the control kernel nodes - as drivers of global dynamics and
functional regulators - play a role in information transfer. To address this, we divided all
nodes in each cell-cycle network into two groups, CK and NCK, where CK denotes the set of
control kernel nodes identified in [33] and NCK is the complement of CK, i.e. the non-control
kernel nodes. An ordered pair of nodes (i, j) in the networks therefore falls into one of
the four groups shown in Table 3, depending on whether i and j belong to CK or NCK. We
labeled the group of each node pair in the scaling patterns for information transfer for
both biological networks (red in Fig. 2), as shown in Fig. 5.
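
The grouping of ordered node pairs can be sketched as below. The node labels "A" to "D" and the choice of control kernel are hypothetical placeholders, not the proteins of either yeast network, and the function names are assumptions for illustration.

```python
def flow_group(i, j, control_kernel):
    """Label the ordered pair (i, j) by control-kernel membership."""
    src = "CK" if i in control_kernel else "NCK"
    dst = "CK" if j in control_kernel else "NCK"
    return f"{src} -> {dst}"


def group_pairs(nodes, control_kernel):
    """Partition all ordered pairs of nodes into the four flow groups."""
    groups = {
        "CK -> CK": [],
        "CK -> NCK": [],
        "NCK -> CK": [],
        "NCK -> NCK": [],
    }
    for i in nodes:
        for j in nodes:
            groups[flow_group(i, j, control_kernel)].append((i, j))
    return groups


# Hypothetical network of four nodes with a two-node control kernel.
toy_nodes = ["A", "B", "C", "D"]
toy_control_kernel = {"A", "B"}
for name, pairs in group_pairs(toy_nodes, toy_control_kernel).items():
    print(name, len(pairs))
```

Because the pairs are ordered, CK -> NCK and NCK -> CK are distinct groups, which is what allows information flowing into the control kernel to be separated from information flowing out of it.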

The highest-ranked regimes of the cell-cycle scaling relations shown in Fig. 2, where the
biological networks differ least from the scale-free networks, are dominated by information
transfer between NCK nodes (NCK → NCK, shown in aqua); see Fig. 5. The most biologically
distinctive regime (χ_f and χ_b) is, by contrast, dominated by information transfer
between CK nodes and NCK nodes, i.e. CK → NCK (purple) and NCK → CK (orange)
(see Supplement Table 1). For the biological networks, the scaling regime that deviates
most from the SF networks is thus dominated by information transfer through the control
kernel nodes. This suggests that for both cell-cycle networks, the patterns in information
processing observed to be most distinct to the biological networks are strongly affected by
the presence of the control kernel, and hence must be associated with the regulation of
function.
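
One way to examine which flow group dominates a given rank regime is sketched below: ordered pairs are ranked by decreasing TE, and the groups are counted within a chosen window of ranks. The function names, the toy TE matrix, the control kernel, and the rank window are all hypothetical; the actual boundaries of the biologically distinct regime are those marked in Fig. 2, not the values used here.

```python
from collections import Counter


def rank_pairs_by_te(te):
    """Return (rank, i, j, value) for all ordered pairs, highest TE first."""
    n = len(te)
    pairs = [(te[i][j], i, j) for i in range(n) for j in range(n)]
    pairs.sort(key=lambda item: -item[0])  # stable sort keeps ties in index order
    return [(rank, i, j, value)
            for rank, (value, i, j) in enumerate(pairs, start=1)]


def regime_composition(ranked, control_kernel, first_rank, last_rank):
    """Count flow groups among pairs with first_rank <= rank <= last_rank."""
    counts = Counter()
    for rank, i, j, _value in ranked:
        if first_rank <= rank <= last_rank:
            src = "CK" if i in control_kernel else "NCK"
            dst = "CK" if j in control_kernel else "NCK"
            counts[f"{src} -> {dst}"] += 1
    return counts


# Hypothetical 3-node TE matrix; node 0 is taken as the only control kernel node.
toy_te = [[0.0, 0.2, 0.1],
          [0.0, 0.0, 0.0],
          [0.3, 0.05, 0.0]]
ranked = rank_pairs_by_te(toy_te)
print(regime_composition(ranked, {0}, first_rank=1, last_rank=3))
```

Applied to a measured TE matrix, comparing the composition of different rank windows gives a simple numerical counterpart to the color coding described for Fig. 5.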


4. Conclusions 

It is an open question whether the concept of information as applied to biological systems
is merely a useful conceptual metaphor or hints at deeper physical principles underlying
biological organization. Support for the viewpoint that information is not merely descriptive,
but instead integral to the operation of living systems (the strong view), requires that
at least two conditions be satisfied: (1) biological systems must be demonstrated to
be unique in their informational architecture, as compared to other classes of
physical systems; and (2) information must be shown to be intrinsic to the operation of
biological systems, e.g. to the execution of biological function. The analyses presented here
provide initial quantitative data addressing these conditions, with results lending support to
the view that biological systems are distinctive in their informational architecture.

Our results indicate that scale-free structure - characterizing the biological and SF networks
in our study, but not the ER networks - plays a significant role in increasing information
transfer within a network. This result is particularly interesting when considered within
the context of the widespread observations of scale-free structure in various biological
networks [23, 24, 25, 26]. The ubiquity of scale-free networks is prima facie explained as
a result of their robustness properties. However, the high levels of information transfer in
SF networks as compared to ER networks uncovered in this study indicate that an alternative
explanatory hypothesis may also hold true - that is, that SF networks arise as a result of
their amplified information processing.

Significantly, the enhancement of SF networks over ER networks is not sufficient to explain
the information transfer observed for the biological networks. Condition (1) is therefore
satisfied by the statistically significant difference between the “information flows” of the
biological networks and those of the ensembles of ER and SF networks: the scale-free structure
taken alone is not sufficient to explain the distribution of information processing for either
biological network. This result has important implications for the modeling of biological
function, which has thus far primarily focused on topological or dynamic properties. In
particular, topological features of scale-free networks - such as a power-law degree
distribution - are often viewed as sufficient to capture essential features of biological
organization, precisely because they are observed in a number of disparate biological
phenomena [23, 26]. Our results indicate that an additional criterion for accurately modeling
biological function is that network models include the dynamics of information transfer in
their construction, and in particular that models should optimize information transfer
between nodes, since this feature distinguishes the biological networks in our study from
generic SF networks. The newly discovered informational scaling relation is an emergent
property of networks that arises from the integration of topology and dynamics, and cannot
be accounted for solely by one of these two features.

Additionally, the biological networks in our study are outliers in terms of the total
information processed in each network, as quantified by Ip. This concept of
information (information processing) satisfies the condition of being not readily copyable to
other states of matter, since it is the physical instantiation (causal mechanisms) that gives
rise to the observed patterns in information flows, and is thus a candidate for satisfying
condition (2). Our analysis also in part directly addresses condition (2), by connecting
the observed distinctive features of the informational architecture of biological networks
in Fig. 2 to their causal structure and their function. The enhancement of information
transfer between nodes for biological networks, as compared to SF or ER, is primarily due
to correlations between nodes which are not directly causally connected (non-local) - a
frequent signature of collective, or critical, states of organization. Typically, criticality is
described in terms of long-range correlations in space. Herein, we have shown that for the
cell-cycle networks, correlations are in space and time, i.e. are associated with information
processing. Interestingly, both networks studied have very similar patterns for the
distribution of TE among causally connected node pairs (Table 2). It is an open question whether
this pattern is indicative of cell-cycle function, or a more general pattern of biological
organization that might be characteristic of networks with other functions. In particular,
since the cell cycle is optimized for processing information through time as a mechanism
for keeping track of phases during cellular division, a question of interest is whether other
networks, optimized for processing spatial information (for example, the genetic regulation
of embryo development), exhibit the same or different informational patterns in their
distribution of correlations among causal edges.

An important feature of both networks in this study, which is shared with other regulatory
networks, is the control kernel, defined as a subset of nodes that regulate the attractor
dynamics associated with biological function. Our analysis has revealed that most of the
biologically distinct regime of the information transfer scaling relations, χ_f and χ_b for
the fission and budding yeast respectively, is attributable to the presence of the control
kernel. Information transfer in or out of the control kernel dominates the biologically
distinct regime for both networks. Furthermore, for the budding yeast network, the ranks
deviating most from random are primarily those corresponding to information transfer
between control kernel nodes. Although they do not conclusively establish that information
is intrinsic to function (i.e. that condition (2) holds), our results clearly indicate that the
patterns in information processing unique to the biological networks in our study are
attributable to the regulation of function by a few key nodes. Interestingly, these nodes have
other properties consistent with their information theoretic interpretation: in particular, the
control kernel provides a mechanism for distinguishability among attractor states for the
biological networks. As noted by Kim et al., the set of control kernel nodes takes on a
unique and distinct state in every attractor state in the networks studied [33]. They thus
provide a means for coarse-graining the state space in a functionally relevant manner (such
that the primary attractor associated with function is distinguishable from other attractor
states). That the network organizes information flows through these nodes as it executes
the dynamics of the cell cycle is a highly non-trivial feature of both networks’ informational
structure. We also note that the control kernel nodes were effectively discovered by a causal
intervention in the manner of Pearl [18], i.e. by fixing the subset of nodes to their value
in the biologically relevant attractor. Combined with our results, this suggests interesting
connections between information transfer and top-down causal regulation [39] of biological
function, discussed in more depth in [36].

We hypothesize that the features reported herein may be common to biological networks of 
different function, and in particular, that scaling relations in information transfer may be 
a hallmark of biological organization. Our results are suggestive of previously unidentified 
information-based organizational principles that go beyond topological considerations such 
as scale-free structure, and may be critical to biological function. They thus open a new 
framework for addressing the debate over the status of information in biology, by
demonstrating quantitatively how the informational architecture of biologically evolved networks
can distinguish them from other classes of network architecture that do not exhibit the 
same informational properties as reported here. 

Constraint                    Erdős-Rényi (ER) networks          Scale-Free (SF) networks
----------------------------  ---------------------------------  ------------------------------
Size of network (total        Same as the cell-cycle network     Same as the cell-cycle network
number of nodes, inhibition
and activation links)

Nodes with a self-loop        Same as the cell-cycle network     Same as the cell-cycle network

Threshold for each node       Same as the cell-cycle network     Same as the cell-cycle network

Number of activation and      NOT the same as the cell-cycle     Same as the cell-cycle network
inhibition links for each     network (→ no structural bias)     (→ scale-free structure)
node

Table 1. Constraints for constructing two different classes of random networks
that retain features of the causal structure of a reference cell-cycle
network.


(a) Fission Yeast

            Edge      No Edge
TE > 0      23.46%    43.21%
TE = 0       7.41%    25.93%

(b) Budding Yeast

            Edge      No Edge
TE > 0      22.31%    48.76%
TE = 0       5.79%    23.14%

Table 2. The distribution of TE within the cell-cycle networks and corresponding
SF and ER networks, classified by pairs of nodes that are correlated
(TE > 0) or not (TE = 0) and causally interacting (edge) or not (no
edge). The values indicate the ratio of the number of node pairs in each
category to the total number of node pairs for each cell-cycle network.

Table 3. Possible information flows between sets of nodes classified as
control kernel (CK) (highlighted in red in Fig. 1) or non-control kernel
(NCK).













HYUNJU KIM, PAUL DAVIES, AND SARA IMARI WALKER



Figure 1. Boolean network models of fission (left) and budding (right)
yeast cell-cycle regulation. Nodes represent the regulatory proteins and
edges denote two types of biochemical interactions between nodes: activation
(ended with an arrow) and inhibition (ended with a bar). The nodes
colored red are the control kernel, which regulates the global behavior of
each network when pinned to specific values. Figure adapted from [33].





NEW SCALING RELATION FOR INFORMATION TRANSFER IN BIOLOGICAL NETWORKS 


Figure 2. Scaling of information processing (TE) among pairs of nodes
for cell-cycle (red), ER (green) and SF (blue) networks. History lengths for
computing TE were k = 2 and k = 5 for the fission and the budding yeast,
respectively. The averages and standard deviations for each of the
random network ensembles are computed over a sample of 1000 networks.
Regions between dashed lines denote χ for each cell-cycle network.

[Figure 3 panels: (a) Fission Yeast; (b) Budding Yeast. Horizontal axis: Ip (bits).]

Figure 3. Distributions for total information processed (sum of TE for all
pairs of nodes) for the ensembles of ER (green) and SF (blue) networks.
Each data point represents the number (y-axis) of individual networks
within the respective ensemble with a given amount of total information
transferred (x-axis). The red line indicates the total information processed
for the fission (upper) and budding (lower) yeast cell-cycle network.

Figure 4. Classification of all pairs of nodes within the cell-cycle (red),
SF (blue) and ER (green) networks by correlation (TE > 0 or TE = 0) and
causal interaction (edge or no edge). Each data bar indicates the percentage
of node pairs (on the y-axis) within each category.

Figure 5. Scaling relations for information transfer for the fission yeast
(top) and budding yeast (bottom) cell-cycle networks. Data shown are the
same as in Fig. 2, with data points divided into four classes of information
transfer: CK → CK (yellow), CK → NCK (purple), NCK → CK
(orange), and NCK → NCK (aqua).













